Replication and Coordination of State Machines

Learn to replicate state machines with coordination to maintain a fault-tolerant service.

A single state machine is only as fault-tolerant as the node it runs on. Replicating a state machine on multiple nodes can make it t fault-tolerant. For successful replication, we need all our replicas to behave similarly.

Outputs from replicas

By behaving similarly, we mean producing the same output. All state machines in a group of replicas will produce the same output if the following conditions are satisfied for every replica that runs on a non-faulty node (or processor):

  1. Every replica starts in the same initial state.

  2. Every replica executes the same requests in the same order.
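The two conditions above can be illustrated with a minimal sketch in Python (the counter state machine and its operations are hypothetical): two deterministic replicas that start in the same state and apply the same requests in the same order produce identical outputs.

```python
class CounterStateMachine:
    """A deterministic state machine: output depends only on the
    initial state and the sequence of requests applied."""

    def __init__(self, initial=0):
        self.value = initial  # condition 1: same initial state

    def apply(self, request):
        """Apply one request deterministically and return the output."""
        op, amount = request
        if op == "add":
            self.value += amount
        elif op == "sub":
            self.value -= amount
        return self.value


# Condition 2: both replicas execute the same requests in the same order.
requests = [("add", 5), ("sub", 2), ("add", 10)]

replica_a = CounterStateMachine()
replica_b = CounterStateMachine()
outputs_a = [replica_a.apply(r) for r in requests]
outputs_b = [replica_b.apply(r) for r in requests]
assert outputs_a == outputs_b  # the replicas behave identically
```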

How many replicas?

Since failures are independent, we will assume that a failure can affect at most one node and (as a result) one state machine. The combined output of an ensemble of replicas resulting from replicating a state machine is the output of its t fault-tolerant state machine.

If nodes can experience Byzantine failures, then for the replica group to be t fault-tolerant, the following must be true:

  1. The group must have a minimum of 2t+1 state machine replicas.

  2. The group's output must be the output produced by a majority of the replicas in the group.

As long as there are no more than t failures in a t fault-tolerant replica group, we can determine the correct output.
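A minimal sketch of majority voting over a 2t+1 replica group (the function name is hypothetical): with at most t Byzantine replicas, at least t+1 non-faulty replicas agree on the correct output, so the majority value is correct.

```python
from collections import Counter

def group_output(replica_outputs, t):
    """Return the output of a 2t+1 replica group by majority vote,
    assuming at most t of the replicas are Byzantine."""
    assert len(replica_outputs) == 2 * t + 1
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count >= t + 1:  # a majority must agree
        return value
    raise RuntimeError("no majority: more than t faults?")

# t = 1: three replicas, one Byzantine replica lies about the output.
print(group_output([42, 42, 99], t=1))  # -> 42
```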

With Byzantine failures, we can have fault tolerance if a majority of replicas work fine. Another way of saying this is that for a total of 2t+1 nodes, no more than t nodes fail. We can see that the replication group's output is correct (green box) because the majority of the replicas are working fine.


Points to ponder

Question 2

Remember how we learned in the first chapter that, in general, we need at least 3f+1 nodes in the replica group to tolerate f Byzantine faults. The text before this question says that 2f+1 is the minimum number of nodes required to find the correct value when there can be f Byzantine faults. Aren’t those two statements in conflict with each other?

Answer

No, those two statements are not in conflict. Under some special conditions, we can achieve a better bound than 3f+1.

Here, we will briefly examine the scenarios where we use a non-faulty server to communicate with all the replicas of the state machine. State machines are not allowed to directly communicate with each other. Therefore, faulty nodes can’t collude or spread false information about others.

In such a case, if there can be at most f Byzantine faults, then we need f more votes to cancel out the faulty ones. We also need one additional vote to make sure we get the correct result (which sums to 2f+1).

In contrast, the general result of 3f+1 assumes a peer-to-peer-like model, where all replicas need to converge on a value by exchanging plain messages.


If nodes can only experience fail-stop failures, then we need at least t+1 replicas to be t fault-tolerant. Since such nodes only give correct outputs, the output of an ensemble of replicas can be the output of any of its members. With t+1 replicas, t failures still leave one replica that can provide the correct output. Therefore, the system is t fault-tolerant.
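The fail-stop case can be sketched in a few lines (the function name is hypothetical, and `None` models a crashed replica): because a fail-stop replica never lies, the group output is simply the output of any replica that still responds.

```python
def fail_stop_group_output(replica_outputs):
    """With fail-stop failures, any replica that answers is correct,
    so the group output is the output of any surviving replica."""
    for out in replica_outputs:
        if out is not None:  # None models a crashed (fail-stop) replica
            return out
    raise RuntimeError("all t+1 replicas failed")

# t = 2, so the group has t+1 = 3 replicas; two of them have crashed.
print(fail_stop_group_output([None, None, 7]))  # -> 7
```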

With fail-stop failures, we can have fault tolerance even if we have a single replica functioning correctly. Another way of saying this is that for a total of t+1 nodes, no more than t nodes fail.

The system will only fail if all nodes fail—the next node to fail after t failures will be the last node.


Now that we understand the conditions required for a t fault-tolerant system, let's explore how to ensure those conditions are satisfied. The first condition requires that all replicas start in the same initial state. If the system is booting up, we can start all replicas with a common known-good state and then maintain that property. The second condition, executing the same commands in the same order, requires replica coordination.

Replica coordination

Replica coordination means that all replicas receive and process the same sequence of requests. We can ensure replica coordination by ensuring the following two properties:

  1. Agreement: All non-faulty state machine replicas receive every request.

  2. Order: All non-faulty state machine replicas process requests in the same relative order.

The order property ensures that the order of processing of requests relative to each other is the same at every replica. Let's assume we have a replication group of two non-faulty replicas. The first replica received requests B and C, and the second replica received requests A, B, and C. While this scenario violates the agreement property, let's set that aside for now. If the first replica processes B first and then C, then it would be valid for the second replica to process its requests in the order A, B, C or B, C, A. What matters is that the relative order of B and C remains the same across both replicas.
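The relative-order check described above can be sketched as a small helper (a hypothetical function, assuming each request appears at most once per log): it keeps only the requests common to both logs and verifies they appear in the same order.

```python
def same_relative_order(log_a, log_b):
    """Check that requests common to both replica logs appear in the
    same relative order (the order property, ignoring agreement)."""
    common = set(log_a) & set(log_b)
    filtered_a = [r for r in log_a if r in common]
    filtered_b = [r for r in log_b if r in common]
    return filtered_a == filtered_b

# Replica 1 processed B then C; replica 2 processed A, B, C.
print(same_relative_order(["B", "C"], ["A", "B", "C"]))  # -> True
print(same_relative_order(["B", "C"], ["A", "C", "B"]))  # -> False
```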

Agreement

To implement the agreement requirement, we can use a designated server that disseminates a value to all replicas in the following manner:

  1. [IC1] All non-faulty replicas agree on the same value.

  2. [IC2] If the designated server is non-faulty, then all non-faulty replicas use its value as the value they agree on.

If requests are disseminated to all replicas in a manner that satisfies the above two requirements, then the agreement requirement is satisfied. The designated server mentioned above can be either of the following two:

  1. A client

  2. A replica in the ensemble that receives the request and sends it to other replicas

When the client is not the designated server, it must confirm that the request is not lost or corrupted by the replica that disseminates it.
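The dissemination step can be sketched as follows (all names are hypothetical, and the network is simulated with direct method calls): one designated replica receives the client's request and relays it to every other replica, so all non-faulty replicas receive the same request (IC1), and if the designated replica is non-faulty, the agreed value is the client's value (IC2).

```python
class Replica:
    """A replica that appends every delivered request to its log."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def deliver(self, request):
        self.log.append(request)


def disseminate(designated, others, request):
    """The designated replica delivers the request locally, then relays
    it to all other replicas. Returns each replica's latest log entry so
    the client can confirm the request was not lost or corrupted."""
    designated.deliver(request)
    for replica in others:
        replica.deliver(request)
    return [r.log[-1] for r in [designated, *others]]


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
acks = disseminate(r1, [r2, r3], "deposit(100)")
assert all(a == "deposit(100)" for a in acks)  # every replica got the request
```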

Order

We can implement the order requirement by having unique identifiers for requests and ensuring that replicas process requests according to the total order of the unique identifiers. A request is defined as stable at a state machine replica sm_i if no request from a correct client with a lower unique identifier can be delivered to sm_i. We satisfy the order requirement if every replica next processes the stable request with the smallest unique identifier. This property is called order implementation.
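A minimal sketch of this mechanism (hypothetical names; stability is simulated by a `stable_up_to` bound rather than a real protocol): each replica buffers incoming requests in a priority queue keyed by unique identifier and only processes a request once no smaller identifier can still arrive.

```python
import heapq

class OrderedReplica:
    """Processes requests in the total order of their unique identifiers."""

    def __init__(self):
        self.pending = []    # min-heap of (uid, request)
        self.processed = []  # requests applied so far, in order

    def receive(self, uid, request):
        """Buffer a request; it may arrive out of identifier order."""
        heapq.heappush(self.pending, (uid, request))

    def advance(self, stable_up_to):
        """Process stable requests, smallest identifier first. A request
        is treated as stable once we know no request with a smaller
        identifier can still be delivered (here, uid <= stable_up_to)."""
        while self.pending and self.pending[0][0] <= stable_up_to:
            uid, request = heapq.heappop(self.pending)
            self.processed.append(request)


replica = OrderedReplica()
replica.receive(3, "C")  # requests arrive out of order...
replica.receive(1, "A")
replica.receive(2, "B")
replica.advance(stable_up_to=3)
print(replica.processed)  # -> ['A', 'B', 'C']  ...but are processed in order
```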

Assigning unique identifiers to requests is crucial in implementing the order requirement. We will explore methods for assigning unique identifiers later.

Special cases

Notice that the order and agreement properties make no assumptions about clients or commands. While this generality is useful when implementing a universal approach for replication, knowledge of commands can allow us to relax the agreement and order requirements. We will explore a couple of such cases in the following section.

Read-only requests

For example, we can relax the agreement property for read-only requests, i.e., requests that do not change state variables. We only need to send such a request to one non-faulty replica. Since the request does not modify the replica's state, that state remains the same as the state of all other non-faulty replicas, so the replica provides the correct output. Therefore, we do not need to ensure that all replicas receive such requests (agreement).

Point to ponder

Question

Is it harder to implement a protocol where we send read-only requests to one replica when Byzantine failures are possible?

Answer

Yes. Byzantine failures are harder to detect, especially when the failed node has not yet exhibited any behavior indicative of a failure. When implementing a protocol for read-only requests with a relaxed agreement requirement, we should be mindful of how Byzantine failures can affect the results, and we might choose not to use the relaxed rule when Byzantine faults are possible.

Commutative requests

Another example is commutative requests, for which we can relax the order property. Commutative requests produce the same output regardless of the order in which a state machine processes them. So if two requests r and r' are processed by state machine sm, then sm will provide the same output whether it processes r first and r' later, or r' first and r later. The state machine below has commutative requests:

A tally state machine

The tally state machine calculates which candidate received at least a majority of votes (represented by the global variable MAJ). When a candidate has received a majority of votes, tally sends this choice to SYSTEM. If the total number of clients (voters) is less than twice the majority and every client can vote once, then all requests are commutative and implementing order would be unnecessary. All replicas will produce the same outputs after processing requests in any order.
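A sketch of the tally state machine described above (class and method names are hypothetical, and `MAJ` is given an illustrative value): with three voters, each voting once, and a majority threshold of 2, the number of voters is less than 2 * MAJ, so at most one candidate can reach a majority, and every processing order yields the same winner.

```python
from itertools import permutations

MAJ = 2  # majority threshold (illustrative value)

class Tally:
    """Counts votes and records the first candidate to reach MAJ votes."""

    def __init__(self):
        self.counts = {}
        self.winner = None

    def vote(self, candidate):
        self.counts[candidate] = self.counts.get(candidate, 0) + 1
        if self.winner is None and self.counts[candidate] >= MAJ:
            self.winner = candidate  # this choice would be sent to SYSTEM


# 3 voters < 2 * MAJ = 4, and each client votes exactly once.
votes = ["alice", "bob", "alice"]
winners = set()
for order in permutations(votes):
    tally = Tally()
    for v in order:
        tally.vote(v)
    winners.add(tally.winner)
assert winners == {"alice"}  # same winner in every processing order
```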

However, in general, we must adhere to the agreement and order requirements.

What's next?

In the next lesson, we’ll see how to implement the order requirement using unique identifiers.
